Author Response for The Unreasonable Effectiveness of Big Models for Semi Supervised Learning

Neural Information Processing Systems

We thank the reviewers for their feedback and for their efforts in reviewing. We respond to each comment below. [...] "Overall, there is no significant contribution to unsupervised pre-training." [...] The fact that our main contribution is a detailed procedure, rather than a theorem, architecture, or other artifact, [...] We believe our contributions are significant. Indeed, R3 recognizes that "the simple semi-supervised framework is still [...] I think it will inspire several future works." [...] While we believe ImageNet is a much more [...] These results can be further improved with better augmentations during fine-tuning and an extra distillation step.


Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Neural Information Processing Systems

We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time horizon, we consider the case where $k \geq \sqrt{T}$. We first show that subsampling is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to sub-optimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples $\Theta(\sqrt{T})$ arms and executes UCB only on that subset, is rate-optimal. Despite theoretically optimal regret, even SS-UCB performs poorly due to excessive exploration of suboptimal arms.
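
The subsampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Bernoulli rewards, uses the textbook UCB1 index as a stand-in for "UCB", and fixes the subsample size at ceil(sqrt(T)). The function name `ss_ucb` and the simulated arm means are hypothetical.

```python
import math
import random

def ss_ucb(means, horizon, rng):
    """Run UCB1 on a random subset of about sqrt(horizon) arms.

    `means` holds the hidden Bernoulli success probability of each arm and
    is used only to simulate rewards (an assumption of this sketch).
    Returns the total reward and the indices of the subsampled arms.
    """
    k = len(means)
    m = min(k, math.ceil(math.sqrt(horizon)))  # subsample size, Theta(sqrt(T))
    subset = rng.sample(range(k), m)           # arms that will ever be pulled
    pulls = [0] * m
    totals = [0.0] * m
    reward_sum = 0.0
    for t in range(1, horizon + 1):
        if t <= m:
            i = t - 1  # initialization: pull each subsampled arm once
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            i = max(
                range(m),
                key=lambda j: totals[j] / pulls[j]
                + math.sqrt(2.0 * math.log(t) / pulls[j]),
            )
        r = 1.0 if rng.random() < means[subset[i]] else 0.0
        pulls[i] += 1
        totals[i] += r
        reward_sum += r
    return reward_sum, subset

# Usage: k = 200 arms and T = 2500 rounds, so k >= sqrt(T) = 50.
rng = random.Random(0)
T = 2500
means = [rng.random() for _ in range(200)]
total, subset = ss_ucb(means, T, rng)
```

Only the 50 subsampled arms are ever pulled; the remaining 150 are ignored, which is the point of subsampling in the many-armed regime.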


The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

Neural Information Processing Systems

Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers or MLP-based architectures have started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for use in low-data regimes. In this work, we propose a simple yet effective framework to improve generalization from small amounts of data. We augment modern CNNs with fully-connected (FC) layers and show the massive impact this architectural change has in low-data regimes. We further present an online joint knowledge-distillation method to utilize the extra FC layers at train time but avoid them during test time. This allows us to improve the generalization of a CNN-based model without any increase in the number of weights at test time. We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning. Our experiments significantly outperform the networks without fully-connected layers, reaching a relative improvement of up to $16\%$ validation accuracy in the supervised setting without adding any extra parameters during inference.


The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings

Neural Information Processing Systems

We examine a class of embeddings based on structured random matrices with orthogonal rows which can be applied in many machine learning applications including dimensionality reduction and kernel approximation. For both the Johnson-Lindenstrauss transform and the angular kernel, we show that we can select matrices yielding guaranteed improved performance in accuracy and/or speed compared to earlier methods. We introduce matrices with complex entries which give significant further accuracy improvement. We provide geometric and Markov chain-based perspectives to help understand the benefits, and empirical results which suggest that the approach is helpful in a wider range of applications.
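
As a rough illustration of an embedding built from a structured random matrix with orthogonal rows, the sketch below applies a few blocks of "random sign flip, then normalized Walsh-Hadamard transform" and keeps a subset of the output coordinates. That this particular construction matches the one studied in the paper is an assumption; the names `fwht`, `make_sign_blocks`, and `embed` are hypothetical, and the input length must be a power of two.

```python
import math
import random

def fwht(vec):
    """Fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is orthogonal."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]

def make_sign_blocks(n, num_blocks, rng):
    """One random +/-1 diagonal per block."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(num_blocks)]

def embed(x, sign_blocks, rows):
    """Apply the sign-flip/Hadamard blocks in turn, then keep the chosen rows.

    The sqrt(n / len(rows)) factor rescales the kept coordinates so that the
    squared norm is preserved on average over a uniformly random row choice.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    for signs in sign_blocks:
        y = fwht([s * v for s, v in zip(signs, y)])
    scale = math.sqrt(n / len(rows))
    return [scale * y[i] for i in rows]

# Usage: embed an 8-dimensional vector, first with all rows, then with 4.
rng = random.Random(0)
n = 8
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
blocks = make_sign_blocks(n, 3, rng)
full = embed(x, blocks, list(range(n)))
reduced = embed(x, blocks, rng.sample(range(n), 4))
```

Because every block is an orthogonal map, keeping all rows preserves the Euclidean norm exactly (up to rounding), and each transform costs O(n log n) rather than the O(n^2) of a dense matrix product.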


Review for NeurIPS paper: Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Neural Information Processing Systems

Additional Feedback: Post-rebuttal comments: I've read the rebuttal and other reviews. The authors have addressed most of my concerns and hence I increase my score. I hope the authors would make the suggested edits in the revised version and explain the role of their main assumption. Can you explain why things fail if this assumption does not hold? Can you make use of a prior (in the case it is informative)?


Review for NeurIPS paper: Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Neural Information Processing Systems

All reviewers agree that the paper considers a problem of relevance (bandits with many arms) and shows interesting results about simple-to-implement learning algorithms based on the greedy principle. However, one lingering concern that arose during the discussions among the reviewers was whether/how the results obtained in the paper applied for the case when the number of arms is larger than the time horizon of the game ($k > T$). It appears that the author response to this question has not been substantial. Though I can see that this will not be an issue -- the proof of Lemma 2 bounds regret with respect to the best possible reward of 1, the author(s) is/are requested to add a precise clarification of this regime in the updated version.


The Unreasonable Effectiveness of LLMs for Query Optimization

arXiv.org Artificial Intelligence

Recent work in database query optimization has used complex machine learning strategies, such as customized reinforcement learning schemes. Surprisingly, we show that LLM embeddings of query text contain useful semantic information for query optimization. Specifically, we show that a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems. Although we only present some preliminary results, an LLM-powered query optimizer could provide significant benefits, both in terms of performance and simplicity.
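
The "simple binary classifier over embedded query vectors" can be pictured with the sketch below. The abstract does not say which classifier or embedding model is used, so everything here is an assumption: logistic regression trained by stochastic gradient descent stands in for the classifier, and the four 2-dimensional vectors are synthetic placeholders, not real LLM embeddings or measured plan outcomes. The names `train_logistic` and `choose_plan` are hypothetical.

```python
import math

def train_logistic(vectors, labels, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    dim = len(vectors[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(label = 1)
            g = p - y                        # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def choose_plan(w, b, x):
    """Return 1 to pick the alternative plan, 0 to keep the default plan."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > 0 else 0

# Synthetic stand-ins for embedded queries (placeholders, not real data).
# Label 1 means "the alternative plan was better" for that query.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
train_y = [1, 1, 0, 0]
w, b = train_logistic(train_x, train_y)
decisions = [choose_plan(w, b, x) for x in train_x]
```

In a real setting the vectors would come from an embedding model applied to the query text, and the labels from comparing the runtimes of the two candidate plans.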


The Unreasonable Effectiveness of Solving Inverse Problems with Neural Networks

arXiv.org Artificial Intelligence

Finding model parameters from data is an essential task in science and engineering, from weather and climate forecasts to plasma control. Previous works have employed neural networks to greatly accelerate finding solutions to inverse problems. Of particular interest are end-to-end models which utilize differentiable simulations in order to backpropagate feedback from the simulated process to the network weights and enable roll-out of multiple time steps. So far, it has been assumed that, while model inference is faster than classical optimization, this comes at the cost of a decrease in solution accuracy. We show that this is generally not true. In fact, neural networks trained to learn solutions to inverse problems can find better solutions than classical optimizers even on their training set. To demonstrate this, we perform both a theoretical analysis as well as an extensive empirical evaluation on challenging problems involving local minima, chaos, and zero-gradient regions. Our findings suggest an alternative use for neural networks: rather than generalizing to new data for fast inference, they can also be used to find better solutions on known data.